Building an RSS Scraper: How do I get my data into a database?

by: nickduddy, 7 years ago

Last edited: 7 years ago

Below is the code I've put together to gather and store the data in a pandas dataframe. What I'm trying to understand next is:

How to pass the data gathered from the feeds into a database?
Where in the script should I add the Advanced Python Scheduler (APScheduler) code?

<pre class='prettyprint lang-py'>
import feedparser
import pandas as pd
import time

rawrss = ['http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml',
          'https://www.yahoo.com/news/rss/',
          'http://www.huffingtonpost.co.uk/feeds/index.xml',
          'http://feeds.feedburner.com/TechCrunch/',
         ]

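# Collect (title, link, summary) for every entry in every feed
posts = []
for url in rawrss:
    feed = feedparser.parse(url)
    for post in feed.entries:
        # .get() avoids an AttributeError when an entry lacks a field (e.g. no summary)
        posts.append((post.get('title', ''), post.get('link', ''), post.get('summary', '')))

# Build the DataFrame from the list of tuples
df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])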
</pre>






Well, if it's in pandas, you can call .to_sql on the DataFrame to write it to either SQLite or MySQL.

You can check out sqlite here: https://pythonprogramming.net/sql-database-python-part-1-inserting-database/

MySQL here: https://pythonprogramming.net/mysql-intro/   (that tutorial was written for Python 2; for Python 3, use "PyMySQL" instead)
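
For the SQLite route, here is a minimal sketch of what the .to_sql call might look like. It is not taken from the linked tutorials. The table name "posts", the helper save_posts, and the sample rows are all made up for illustration; in the real script the DataFrame would come from the feedparser loop above. It uses an in-memory database so it runs without creating a file.

```python
import sqlite3

import pandas as pd

# Stand-in rows (hypothetical); in the real script these come from the feed loop.
posts = [
    ("Example title 1", "http://example.com/1", "First summary"),
    ("Example title 2", "http://example.com/2", "Second summary"),
]
df = pd.DataFrame(posts, columns=["title", "link", "summary"])


def save_posts(frame, conn, table="posts"):
    """Append the DataFrame rows to `table`, creating the table if needed."""
    # index=False keeps the pandas row index out of the database table.
    frame.to_sql(table, conn, if_exists="append", index=False)


# ":memory:" is for demonstration; pass a path such as "feeds.db" to persist.
conn = sqlite3.connect(":memory:")
save_posts(df, conn)

# Read the rows back to confirm they were stored.
stored = pd.read_sql_query("SELECT title, link, summary FROM posts", conn)
print(stored)
```

Note that if_exists="append" adds rows on every run, so scraping the same feed twice will store duplicates unless you filter them out first (for example by link).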

-Harrison 7 years ago
